-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Stop using api_core default timeouts in publish since they are broken #1326
Conversation
3125634
to
bd6ef26
Compare
c527c2e
to
82d69ca
Compare
@beltran PTAL |
google/cloud/pubsub_v1/types.py
Outdated
# value results in retries with zero deadline. | ||
# Refer https://github.com/googleapis/python-api-core/issues/654 | ||
timeout: "OptionalTimeout" = ExponentialTimeout( | ||
initial=5, maximum=60, multiplier=1.3, deadline=600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little nervous about changing the default behavior from 60s to 5s for the initial request. Prior to the change that resulted in unexpected backoff behavior, what was the initial value? If it was 5, this could be considered a return to the "correct" value, but if this is the first time we are introducing it as 5, I fear we could break behavior for people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prior to the change that resulted in unexpected backoff behavior, what was the initial value?
Initial value = 60s
Pre api_core change:
initial RPC value = 60s
retry RPC value = 60s
total timeout across retries = 600s
Post api_core change:
initial RPC value = 60s
retry RPC value = 0s
total timeout across retries = 600s
Post this PR:
Initial RPC value = 5s
retry RPC value = [5, 60] with multiplier=1.3
total timeout across retries = 600s
but if this is the first time we are introducing it as 5, I fear we could break behavior for people.
- This is the first time we have 5s as the initial RPC value and yes, this is a breaking change.
- Hypothetically, even if the initial value was 5s(it isn't), then changed to 60s in 2022, I'm not sure how much weight we must give to the fact that it was 5s initially, since its been a few years with this new behavior, i.e, this would still be a breaking change for those who onboarded between 2022 and now.
Code / Links for easy reference
- Both pre and post api_core change, we passed in
default_timeout=60
from Pub/Sub to api_core library. - Pre api_core change, this was interpreted as ConstantTimeout(60)
- Post api_core change, this is interpreted as TimeToDeadlineTimeout(60)
- This interpretation of
default_timeout=60.0
in api_core happens here - Post PR change, timeout decorator applied would be ExponentialTimeout instead of TimeToDeadlineTimeout(60)
- Definition of different timeout options can be found here
82d69ca
to
6e0f8a2
Compare
Change in behavior
Default publish timeout behavior:
Pre-change: 60s timeout for initial RPC, 0s timeout per retry, 600s total timeout
Post-change:60s timeout for initial RPC, 60s timeout per retry, 600s total timeout
Change
Only non-test / non sample(i.e functional) usage found for
PublisherOptions.timeout
is here:python-pubsub/google/cloud/pubsub_v1/publisher/client.py
Lines 440 to 441 in f648f65
This change in usage is intended.
Call stack:
python-pubsub/google/cloud/pubsub_v1/publisher/client.py
Line 299 in f648f65
invokes one of the sequencers(ordered or unordered)
python-pubsub/google/cloud/pubsub_v1/publisher/client.py
Lines 477 to 479 in f648f65
which then create a batch with this timeout
python-pubsub/google/cloud/pubsub_v1/publisher/_sequencer/ordered_sequencer.py
Lines 255 to 263 in f648f65
which uses it when committing the batch / publishing the messages:
python-pubsub/google/cloud/pubsub_v1/publisher/_batch/thread.py
Lines 323 to 328 in f648f65
and invokes gapic_publish:
python-pubsub/google/cloud/pubsub_v1/publisher/client.py
Lines 289 to 291 in f648f65
which then uses it in the RPC call:
python-pubsub/google/pubsub_v1/services/publisher/client.py
Lines 1067 to 1072 in f648f65
that is obtained by wrapping the transport base publish call:
python-pubsub/google/pubsub_v1/services/publisher/client.py
Line 1055 in f648f65
The wrapper uses the timeout when invoking the GapicCallable:
https://github.com/googleapis/python-api-core/blob/b1fae31c8c71ed65ad22a415dab2b54720c3d4ba/google/api_core/gapic_v1/method.py#L245-L252
which currently overrides float and int timeouts to use TimeToDeadlineTimeout:
https://github.com/googleapis/python-api-core/blob/b1fae31c8c71ed65ad22a415dab2b54720c3d4ba/google/api_core/gapic_v1/method.py#L112-L113
which is incorrect(why its incorrect is mentioned in the attached bug)
Fixes #1325 🦕